Modelando la serie de la temperatura

La serie de la temperatura fue extraída de la base de datos del IDEAM, se tuvieron en cuenta los promedios diarios de las temperaturas en grados centigrados (°C) registradas en BogotÔ en las diferentes estaciones meteorológicas que recolectaron información de esos días. La serie de tiempo cuenta con un total de 1826 registros, de los cuales 13 (0.7%) fueron imputados puesto que no se presentaba la información necesaria. Esta imputación fue realizada a partir del método de vecinos mÔs cercanos (KNN), donde se tuvierón en cuenta 5 vecinos.

Para el modelar la serie de la temperatura se usara inicialmente un modelo SARIMA, seguido de redes recurrentes simples, SRNN y finalmente GRU. Para seleccionar el mejor modelo se tomarƔ como criterio el error cuadrƔtico medio.

Red Neuronal Recurrente Simple

Importación de Datos

In [1]:
import numpy as np
import pandas as pd
import plotly.express as px
import matplotlib.pyplot as plt

import sklearn as sk
from sklearn import impute
from sklearn import preprocessing
import sklearn.externals
import joblib
from sklearn.model_selection import TimeSeriesSplit
from sklearn.impute import KNNImputer
import sklearn.preprocessing

from keras.models import Sequential
from tensorflow.keras.callbacks import EarlyStopping
import time
import sklearn.externals
import joblib
import plotly.graph_objects as go
from sklearn import metrics

import statsmodels.api as sm
import statsmodels.tsa.stattools as ts
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from statsmodels.tsa.statespace.sarimax import SARIMAX

import tensorflow as tf
import tensorflow.keras.layers as L
import tensorflow.keras.models as M
import tensorflow.keras.backend as K

%matplotlib inline
In [2]:
import plotly.io as pio
pio.renderers.default='notebook'
In [3]:
from google.colab import drive
drive.mount('/content/drive')
Mounted at /content/drive
In [4]:
df = pd.read_csv("/content/drive/Shareddrives/Mineria /Temperatura1.csv", sep=';', header=0, decimal = ',')
Fecha = pd.date_range(start='2017-01-01', end='2021-12-31', freq='D')
df['Fecha'] = Fecha
df = df.set_index('Fecha')

print(df[pd.isnull(df.ValorObservado)])
print('En total hay' ,
      str(df['ValorObservado'].isnull().sum()) ,
      'valores sin información')
print('Correspondientes al {:.3f}% del total'
      .format(df['ValorObservado'].isnull().sum()*100/len(df)))
            ValorObservado
Fecha                     
2017-08-12             NaN
2017-12-24             NaN
2019-09-15             NaN
2019-09-16             NaN
2019-09-17             NaN
2020-11-12             NaN
2021-01-05             NaN
2021-01-06             NaN
2021-01-07             NaN
2021-01-08             NaN
2021-08-18             NaN
2021-08-20             NaN
2021-12-05             NaN
En total hay 13 valores sin información
Correspondientes al 0.712% del total

La serie presenta valores faltantes, por lo tantol se imputaran usando el método de vecinos mÔs cercanos (KNN), como se muestra a continuación.

Imputación a partir del vecino mÔs cercano

In [5]:
#Imputación de Valores usando el vecino mÔs cercano
imput = KNNImputer(n_neighbors=5, weights="uniform")

# Ajustamos el modelo e imputamos los missing values
imput.fit(df[['ValorObservado']])
df['ValorObservado'] = imput.transform(df[['ValorObservado']]).ravel()
print()
print("Valores pƩrdidos en ValorObservado: " , 
      str(df['ValorObservado'].isnull().sum()))
Valores pƩrdidos en ValorObservado:  0
In [6]:
fig = px.line(df, x=df.index, y="ValorObservado")
fig.update_xaxes(title_text="Fecha")
fig.show()

Separación de datos entrenamiento y validación

Para el respectivo anƔlisis se tomarƔn el 80% de los datos para entrenamiento y validacion, el 20% restantes para prueba, dichos valores corresponden a 1460 y 365 respectivamente.

In [7]:
from sklearn.preprocessing import MinMaxScaler
# crea el objeto  scaler y escala los datos
scaler = MinMaxScaler(feature_range=(0, 1))
scaled_data = scaler.fit_transform(df.values)
#
df_norm = pd.DataFrame(scaled_data,index=df.index, columns=['ValorObservadoNormalizado'])
In [8]:
train_size = int(len(df_norm) * 0.8)
test_size = len(df_norm) - train_size
train, test = df_norm.iloc[0:train_size], df_norm.iloc[train_size:len(df_norm)]
len_train = len(train)
len_test = len(test)
print(len_train, len_test)
1460 366
In [9]:
def create_dataset(X, y, time_steps=1):
    # crea dos listas vacias para depositar los datos
    Xs, ys = [], []
    # el primer lote de datos empieza en la primera observación
    # y toma time_steps  datos.
    # Comienza a avanzar hacia adelante.
    for i in range(len(X) - time_steps):
        v = X.iloc[i:(i + time_steps)].values
        Xs.append(v)
        ys.append(y.iloc[i + time_steps])
    return np.array(Xs), np.array(ys)
In [10]:
time_steps = 50

# reshape to [samples, time_steps, n_features]

X_train, y_train = create_dataset(train, train, time_steps)
X_test, y_test = create_dataset(test, test, time_steps)

print("X_train.shape = ", X_train.shape)
print("y_train.shape = ", y_train.shape)
print("X_test.shape = ", X_test.shape)
print("y_test.shape = ", y_test.shape)
X_train.shape =  (1410, 50, 1)
y_train.shape =  (1410, 1)
X_test.shape =  (316, 50, 1)
y_test.shape =  (316, 1)
In [11]:
fig = px.line(df_norm, x=df.index, y='ValorObservadoNormalizado')
fig.update_xaxes(title_text="Fecha")
fig.update_yaxes(title_text="ValorObservadoNormalizado")
fig.show()

Modelo (1 paso adelante)

In [12]:
from sklearn.utils import shuffle
# shapes
inputs_shape = (X_train.shape[1], X_train.shape[2])
SRNN_output = 60

# layers
inputs = L.Input(inputs_shape)
srnn = L.SimpleRNN(units=SRNN_output, name='SRNN')(inputs)
outputs_SRNN = L.Dense(1)(srnn)

SRNN_model = Model(inputs=inputs, outputs=outputs_SRNN, name='series_SRNN_model')

# Compiling the RNN
SRNN_model.compile(loss="mean_squared_error", optimizer=Adam(0.001))
# Fitting to the training set
start = time.time()
SRNN = SRNN_model.fit(
    X_train,
    y_train,
    epochs=50,
    batch_size=16,
    validation_split=0.1,
    verbose=1,
    shuffle=False
)
print("compilation time : ", time.time() - start)
Epoch 1/50
80/80 [==============================] - 2s 14ms/step - loss: 0.0175 - val_loss: 0.0196
Epoch 2/50
80/80 [==============================] - 1s 11ms/step - loss: 0.0143 - val_loss: 0.0058
Epoch 3/50
80/80 [==============================] - 1s 11ms/step - loss: 0.0096 - val_loss: 0.0038
Epoch 4/50
80/80 [==============================] - 1s 11ms/step - loss: 0.0072 - val_loss: 0.0037
Epoch 5/50
80/80 [==============================] - 1s 10ms/step - loss: 0.0068 - val_loss: 0.0036
Epoch 6/50
80/80 [==============================] - 1s 11ms/step - loss: 0.0065 - val_loss: 0.0036
Epoch 7/50
80/80 [==============================] - 1s 10ms/step - loss: 0.0063 - val_loss: 0.0035
Epoch 8/50
80/80 [==============================] - 1s 10ms/step - loss: 0.0061 - val_loss: 0.0034
Epoch 9/50
80/80 [==============================] - 1s 10ms/step - loss: 0.0060 - val_loss: 0.0034
Epoch 10/50
80/80 [==============================] - 1s 10ms/step - loss: 0.0059 - val_loss: 0.0034
Epoch 11/50
80/80 [==============================] - 1s 11ms/step - loss: 0.0059 - val_loss: 0.0033
Epoch 12/50
80/80 [==============================] - 1s 12ms/step - loss: 0.0058 - val_loss: 0.0033
Epoch 13/50
80/80 [==============================] - 1s 11ms/step - loss: 0.0058 - val_loss: 0.0033
Epoch 14/50
80/80 [==============================] - 1s 10ms/step - loss: 0.0058 - val_loss: 0.0033
Epoch 15/50
80/80 [==============================] - 1s 10ms/step - loss: 0.0058 - val_loss: 0.0033
Epoch 16/50
80/80 [==============================] - 1s 10ms/step - loss: 0.0058 - val_loss: 0.0033
Epoch 17/50
80/80 [==============================] - 1s 10ms/step - loss: 0.0057 - val_loss: 0.0033
Epoch 18/50
80/80 [==============================] - 1s 10ms/step - loss: 0.0057 - val_loss: 0.0033
Epoch 19/50
80/80 [==============================] - 1s 10ms/step - loss: 0.0057 - val_loss: 0.0033
Epoch 20/50
80/80 [==============================] - 1s 10ms/step - loss: 0.0057 - val_loss: 0.0033
Epoch 21/50
80/80 [==============================] - 1s 10ms/step - loss: 0.0057 - val_loss: 0.0033
Epoch 22/50
80/80 [==============================] - 1s 10ms/step - loss: 0.0057 - val_loss: 0.0033
Epoch 23/50
80/80 [==============================] - 1s 11ms/step - loss: 0.0057 - val_loss: 0.0033
Epoch 24/50
80/80 [==============================] - 1s 10ms/step - loss: 0.0057 - val_loss: 0.0033
Epoch 25/50
80/80 [==============================] - 1s 10ms/step - loss: 0.0057 - val_loss: 0.0033
Epoch 26/50
80/80 [==============================] - 1s 10ms/step - loss: 0.0057 - val_loss: 0.0033
Epoch 27/50
80/80 [==============================] - 1s 10ms/step - loss: 0.0057 - val_loss: 0.0033
Epoch 28/50
80/80 [==============================] - 1s 10ms/step - loss: 0.0057 - val_loss: 0.0033
Epoch 29/50
80/80 [==============================] - 1s 10ms/step - loss: 0.0057 - val_loss: 0.0033
Epoch 30/50
80/80 [==============================] - 1s 10ms/step - loss: 0.0057 - val_loss: 0.0033
Epoch 31/50
80/80 [==============================] - 1s 10ms/step - loss: 0.0057 - val_loss: 0.0033
Epoch 32/50
80/80 [==============================] - 1s 10ms/step - loss: 0.0057 - val_loss: 0.0033
Epoch 33/50
80/80 [==============================] - 1s 10ms/step - loss: 0.0057 - val_loss: 0.0033
Epoch 34/50
80/80 [==============================] - 1s 11ms/step - loss: 0.0057 - val_loss: 0.0033
Epoch 35/50
80/80 [==============================] - 1s 10ms/step - loss: 0.0057 - val_loss: 0.0033
Epoch 36/50
80/80 [==============================] - 1s 10ms/step - loss: 0.0057 - val_loss: 0.0033
Epoch 37/50
80/80 [==============================] - 1s 10ms/step - loss: 0.0057 - val_loss: 0.0033
Epoch 38/50
80/80 [==============================] - 1s 11ms/step - loss: 0.0057 - val_loss: 0.0033
Epoch 39/50
80/80 [==============================] - 1s 10ms/step - loss: 0.0057 - val_loss: 0.0033
Epoch 40/50
80/80 [==============================] - 1s 10ms/step - loss: 0.0057 - val_loss: 0.0033
Epoch 41/50
80/80 [==============================] - 1s 10ms/step - loss: 0.0057 - val_loss: 0.0033
Epoch 42/50
80/80 [==============================] - 1s 10ms/step - loss: 0.0057 - val_loss: 0.0033
Epoch 43/50
80/80 [==============================] - 1s 10ms/step - loss: 0.0057 - val_loss: 0.0033
Epoch 44/50
80/80 [==============================] - 1s 10ms/step - loss: 0.0057 - val_loss: 0.0033
Epoch 45/50
80/80 [==============================] - 1s 10ms/step - loss: 0.0057 - val_loss: 0.0033
Epoch 46/50
80/80 [==============================] - 1s 10ms/step - loss: 0.0057 - val_loss: 0.0033
Epoch 47/50
80/80 [==============================] - 1s 10ms/step - loss: 0.0057 - val_loss: 0.0033
Epoch 48/50
80/80 [==============================] - 1s 10ms/step - loss: 0.0057 - val_loss: 0.0032
Epoch 49/50
80/80 [==============================] - 1s 10ms/step - loss: 0.0057 - val_loss: 0.0032
Epoch 50/50
80/80 [==============================] - 1s 10ms/step - loss: 0.0057 - val_loss: 0.0032
compilation time :  42.74334645271301
In [13]:
joblib.dump(SRNN_model,'/content/drive/Shareddrives/Mineria /SRNN')
Out[13]:
['/content/drive/Shareddrives/Mineria /SRNN']
In [14]:
SRNN_model.summary()
Model: "series_SRNN_model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 input_1 (InputLayer)        [(None, 50, 1)]           0         
                                                                 
 SRNN (SimpleRNN)            (None, 60)                3720      
                                                                 
 dense (Dense)               (None, 1)                 61        
                                                                 
=================================================================
Total params: 3,781
Trainable params: 3,781
Non-trainable params: 0
_________________________________________________________________
In [15]:
SRNN_losses = pd.DataFrame(SRNN.history)
fig = px.line(SRNN_losses, x=SRNN_losses.index, y=["loss", "val_loss"])
fig.update_xaxes(title_text="Epoch")
fig.update_yaxes(title_text="Loss")
fig.show()

Predicciones (1 paso adelante)

In [16]:
SRNN_Predict = SRNN_model.predict(X_test)
SRNN_Predict = scaler.inverse_transform(SRNN_Predict)
10/10 [==============================] - 0s 5ms/step
In [17]:
# plot y_train, y_test, and testPredict using plotly
seq_len=50
fig = go.Figure()
fig.add_trace(
    go.Scatter(
        x=df.index[seq_len : len(y_train) + seq_len],
        y=scaler.inverse_transform(y_train).ravel(),
        mode="lines",
        name="Entrenamiento",
    )
)
fig.add_trace(
    go.Scatter(
        x=df.index[len(y_train) + seq_len :],
        y=scaler.inverse_transform(y_test).ravel(),
        mode="lines",
        name="Prueba",
    )
)
fig.add_trace(
    go.Scatter(
        x=df.index[len(y_train) + seq_len :],
        y=SRNN_Predict.ravel(),
        mode="lines",
        name="Predicción",
    )
)
fig.update_xaxes(title_text="Fecha")
fig.update_yaxes(title_text="ValorObservado")
fig.show()

Intervalos de Confianza (1 paso adelante)

In [18]:
def QuantileLoss(perc, delta=1e-4):
    perc = np.array(perc).reshape(-1)
    perc.sort()
    perc = perc.reshape(1, -1)
    def _qloss(y, pred):
        I = tf.cast(y <= pred, tf.float32)
        d = K.abs(y - pred)
        correction = I * (1 - perc) + (1 - I) * perc
        # huber loss
        huber_loss = K.sum(correction * tf.where(d <= delta, 0.5 * d ** 2 / delta, d - 0.5 * delta), -1)
        # order loss
        q_order_loss = K.sum(K.maximum(0.0, pred[:, :-1] - pred[:, 1:] + 1e-6), -1)
        return huber_loss + q_order_loss
    return _qloss
In [19]:
# quantiles
perc_points = [0.025, 0.975]

# shapes
inputs_shape = (X_train.shape[1], X_train.shape[2])
SRNN_output = 60

# layers
inputs = L.Input(inputs_shape)
qsrnn = L.SimpleRNN(units=SRNN_output, name='SRNN')(inputs)
qoutputs_SRNN = L.Dense(2)(qsrnn)

qSRNN_model = Model(inputs=inputs, outputs=qoutputs_SRNN, name='series_SRNN_model')

# Compiling the RNN
qSRNN_model.compile(Adam(0.001), loss=QuantileLoss(perc_points))
# Fitting to the training set
start = time.time()
qSRNN = qSRNN_model.fit(
    X_train,
    y_train,
    epochs=50,
    batch_size=16,
    validation_split=0.1,
    verbose=1,
    shuffle=False
)
print("compilation time : ", time.time() - start)
Epoch 1/50
80/80 [==============================] - 2s 13ms/step - loss: 0.0753 - val_loss: 0.0119
Epoch 2/50
80/80 [==============================] - 1s 10ms/step - loss: 0.0147 - val_loss: 0.0108
Epoch 3/50
80/80 [==============================] - 1s 10ms/step - loss: 0.0170 - val_loss: 0.0123
Epoch 4/50
80/80 [==============================] - 1s 10ms/step - loss: 0.0153 - val_loss: 0.0147
Epoch 5/50
80/80 [==============================] - 1s 10ms/step - loss: 0.0187 - val_loss: 0.0127
Epoch 6/50
80/80 [==============================] - 1s 10ms/step - loss: 0.0134 - val_loss: 0.0088
Epoch 7/50
80/80 [==============================] - 1s 10ms/step - loss: 0.0130 - val_loss: 0.0101
Epoch 8/50
80/80 [==============================] - 1s 10ms/step - loss: 0.0133 - val_loss: 0.0089
Epoch 9/50
80/80 [==============================] - 1s 11ms/step - loss: 0.0120 - val_loss: 0.0094
Epoch 10/50
80/80 [==============================] - 1s 10ms/step - loss: 0.0120 - val_loss: 0.0090
Epoch 11/50
80/80 [==============================] - 1s 10ms/step - loss: 0.0118 - val_loss: 0.0091
Epoch 12/50
80/80 [==============================] - 1s 11ms/step - loss: 0.0122 - val_loss: 0.0081
Epoch 13/50
80/80 [==============================] - 1s 10ms/step - loss: 0.0115 - val_loss: 0.0081
Epoch 14/50
80/80 [==============================] - 1s 10ms/step - loss: 0.0110 - val_loss: 0.0078
Epoch 15/50
80/80 [==============================] - 1s 10ms/step - loss: 0.0110 - val_loss: 0.0078
Epoch 16/50
80/80 [==============================] - 1s 10ms/step - loss: 0.0111 - val_loss: 0.0083
Epoch 17/50
80/80 [==============================] - 2s 19ms/step - loss: 0.0111 - val_loss: 0.0080
Epoch 18/50
80/80 [==============================] - 1s 19ms/step - loss: 0.0109 - val_loss: 0.0086
Epoch 19/50
80/80 [==============================] - 2s 19ms/step - loss: 0.0106 - val_loss: 0.0077
Epoch 20/50
80/80 [==============================] - 1s 10ms/step - loss: 0.0111 - val_loss: 0.0078
Epoch 21/50
80/80 [==============================] - 1s 11ms/step - loss: 0.0106 - val_loss: 0.0071
Epoch 22/50
80/80 [==============================] - 1s 10ms/step - loss: 0.0104 - val_loss: 0.0069
Epoch 23/50
80/80 [==============================] - 1s 10ms/step - loss: 0.0100 - val_loss: 0.0073
Epoch 24/50
80/80 [==============================] - 1s 11ms/step - loss: 0.0105 - val_loss: 0.0071
Epoch 25/50
80/80 [==============================] - 1s 10ms/step - loss: 0.0103 - val_loss: 0.0080
Epoch 26/50
80/80 [==============================] - 1s 10ms/step - loss: 0.0108 - val_loss: 0.0071
Epoch 27/50
80/80 [==============================] - 1s 10ms/step - loss: 0.0102 - val_loss: 0.0072
Epoch 28/50
80/80 [==============================] - 1s 10ms/step - loss: 0.0101 - val_loss: 0.0076
Epoch 29/50
80/80 [==============================] - 1s 10ms/step - loss: 0.0103 - val_loss: 0.0076
Epoch 30/50
80/80 [==============================] - 1s 10ms/step - loss: 0.0102 - val_loss: 0.0081
Epoch 31/50
80/80 [==============================] - 1s 10ms/step - loss: 0.0102 - val_loss: 0.0074
Epoch 32/50
80/80 [==============================] - 1s 10ms/step - loss: 0.0104 - val_loss: 0.0072
Epoch 33/50
80/80 [==============================] - 1s 10ms/step - loss: 0.0102 - val_loss: 0.0074
Epoch 34/50
80/80 [==============================] - 1s 10ms/step - loss: 0.0101 - val_loss: 0.0068
Epoch 35/50
80/80 [==============================] - 1s 13ms/step - loss: 0.0099 - val_loss: 0.0067
Epoch 36/50
80/80 [==============================] - 2s 19ms/step - loss: 0.0100 - val_loss: 0.0069
Epoch 37/50
80/80 [==============================] - 2s 19ms/step - loss: 0.0099 - val_loss: 0.0070
Epoch 38/50
80/80 [==============================] - 1s 15ms/step - loss: 0.0098 - val_loss: 0.0069
Epoch 39/50
80/80 [==============================] - 1s 10ms/step - loss: 0.0099 - val_loss: 0.0081
Epoch 40/50
80/80 [==============================] - 1s 10ms/step - loss: 0.0105 - val_loss: 0.0069
Epoch 41/50
80/80 [==============================] - 1s 11ms/step - loss: 0.0098 - val_loss: 0.0071
Epoch 42/50
80/80 [==============================] - 1s 11ms/step - loss: 0.0098 - val_loss: 0.0072
Epoch 43/50
80/80 [==============================] - 1s 10ms/step - loss: 0.0099 - val_loss: 0.0068
Epoch 44/50
80/80 [==============================] - 1s 10ms/step - loss: 0.0098 - val_loss: 0.0073
Epoch 45/50
80/80 [==============================] - 1s 10ms/step - loss: 0.0099 - val_loss: 0.0072
Epoch 46/50
80/80 [==============================] - 1s 10ms/step - loss: 0.0100 - val_loss: 0.0072
Epoch 47/50
80/80 [==============================] - 1s 11ms/step - loss: 0.0098 - val_loss: 0.0069
Epoch 48/50
80/80 [==============================] - 1s 10ms/step - loss: 0.0098 - val_loss: 0.0079
Epoch 49/50
80/80 [==============================] - 1s 10ms/step - loss: 0.0099 - val_loss: 0.0073
Epoch 50/50
80/80 [==============================] - 1s 11ms/step - loss: 0.0100 - val_loss: 0.0072
compilation time :  83.14454102516174
In [20]:
qSRNN_model.summary()
Model: "series_SRNN_model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 input_2 (InputLayer)        [(None, 50, 1)]           0         
                                                                 
 SRNN (SimpleRNN)            (None, 60)                3720      
                                                                 
 dense_1 (Dense)             (None, 2)                 122       
                                                                 
=================================================================
Total params: 3,842
Trainable params: 3,842
Non-trainable params: 0
_________________________________________________________________
In [21]:
qSRNN_losses = pd.DataFrame(qSRNN.history)
fig = px.line(qSRNN_losses, x=qSRNN_losses.index, y=["loss", "val_loss"])
fig.update_xaxes(title_text="Epoch")
fig.update_yaxes(title_text="Loss")
fig.show()
In [22]:
qSRNN_Predict = qSRNN_model.predict(X_test)
qSRNN_Predict = scaler.inverse_transform(qSRNN_Predict)
10/10 [==============================] - 0s 5ms/step
In [23]:
# plot y_train, y_test, and testPredict using plotly
fig = go.Figure()
fig.add_trace(
    go.Scatter(
        x=df.index[seq_len : len(y_train) + seq_len],
        y=scaler.inverse_transform(y_train).ravel(),
        mode="lines",
        name="Entrenamiento",
    )
)
fig.add_trace(
    go.Scatter(
        x=df.index[len(y_train) + seq_len :],
        y=SRNN_Predict.ravel(),
        mode="lines",
        name="Predicción",
    )
)
fig.add_trace(
    go.Scatter(
        x=df.index[len(y_train) + seq_len :],
        y=qSRNN_Predict[:,0] ,
        mode="lines",
        name="0.025",
    )
)
fig.add_trace(
    go.Scatter(
        x=df.index[len(y_train) + seq_len :],
        y=qSRNN_Predict[:,1] ,
        mode="lines",
        name="0.975",
    )
)
fig.update_xaxes(title_text="Fecha")
fig.update_yaxes(title_text="ValorObservado")
fig.show()

Error CuadrƔtico Medio (1 paso adelante)

In [24]:
SRNN_Score1 = metrics.mean_squared_error(scaler.inverse_transform(y_test), SRNN_Predict)
print('Test Score: %.2f RMSE' % (SRNN_Score1))
Test Score: 0.72 RMSE

Predicción 5 pasos adelante

In [25]:
def create_dataset(X, y, time_steps=1):
    # crea dos listas vacias para depositar los datos
    Xs, ys = [], []
    # el primer lote de datos empieza en la primera observación
    # y toma time_steps  datos.
    # Comienza a avanzar hacia adelante.
    for i in range(len(X) - time_steps-5):
        v = X.iloc[i:(i + time_steps)].values
        Xs.append(v)
        ys.append(y.iloc[i + time_steps+5])
    return np.array(Xs), np.array(ys)
In [26]:
time_steps = 50

# reshape to [samples, time_steps, n_features]

X_train_5p, y_train_5p = create_dataset(train, train, time_steps)
X_test_5p, y_test_5p = create_dataset(test, test, time_steps)

print("X_train.shape = ", X_train_5p.shape)
print("y_train.shape = ", y_train_5p.shape)
print("X_test.shape = ", X_test_5p.shape)
print("y_test.shape = ", y_test_5p.shape)
X_train.shape =  (1405, 50, 1)
y_train.shape =  (1405, 1)
X_test.shape =  (311, 50, 1)
y_test.shape =  (311, 1)
In [27]:
# shapes
inputs_shape = (X_train_5p.shape[1], X_train_5p.shape[2])
SRNN_output_5p = 60

# layers
inputs_5p = L.Input(inputs_shape)
srnn_5p = L.SimpleRNN(units=SRNN_output_5p, name='SRNN')(inputs_5p)
outputs_SRNN_5p = L.Dense(1)(srnn_5p)

SRNN_model_5p = Model(inputs=inputs_5p, outputs=outputs_SRNN_5p, name='series_SRNN_model')

# Compiling the RNN
SRNN_model_5p.compile(optimizer=Adam(0.001),loss="mean_squared_error")
# Fitting to the training set
start = time.time()
SRNN_5p = SRNN_model_5p.fit(
    X_train_5p,
    y_train_5p,
    epochs=50,
    batch_size=16,
    validation_split=0.1,
    verbose=1,
    shuffle=False
)
print("compilation time : ", time.time() - start)
Epoch 1/50
79/79 [==============================] - 2s 12ms/step - loss: 0.0204 - val_loss: 0.0445
Epoch 2/50
79/79 [==============================] - 1s 9ms/step - loss: 0.0255 - val_loss: 0.0068
Epoch 3/50
79/79 [==============================] - 1s 10ms/step - loss: 0.0167 - val_loss: 0.0054
Epoch 4/50
79/79 [==============================] - 1s 9ms/step - loss: 0.0132 - val_loss: 0.0052
Epoch 5/50
79/79 [==============================] - 1s 10ms/step - loss: 0.0130 - val_loss: 0.0052
Epoch 6/50
79/79 [==============================] - 1s 10ms/step - loss: 0.0127 - val_loss: 0.0051
Epoch 7/50
79/79 [==============================] - 1s 9ms/step - loss: 0.0126 - val_loss: 0.0051
Epoch 8/50
79/79 [==============================] - 1s 10ms/step - loss: 0.0124 - val_loss: 0.0051
Epoch 9/50
79/79 [==============================] - 1s 9ms/step - loss: 0.0123 - val_loss: 0.0051
Epoch 10/50
79/79 [==============================] - 1s 10ms/step - loss: 0.0122 - val_loss: 0.0051
Epoch 11/50
79/79 [==============================] - 1s 9ms/step - loss: 0.0121 - val_loss: 0.0051
Epoch 12/50
79/79 [==============================] - 1s 9ms/step - loss: 0.0121 - val_loss: 0.0051
Epoch 13/50
79/79 [==============================] - 1s 10ms/step - loss: 0.0120 - val_loss: 0.0052
Epoch 14/50
79/79 [==============================] - 1s 10ms/step - loss: 0.0120 - val_loss: 0.0052
Epoch 15/50
79/79 [==============================] - 1s 10ms/step - loss: 0.0120 - val_loss: 0.0052
Epoch 16/50
79/79 [==============================] - 1s 9ms/step - loss: 0.0119 - val_loss: 0.0052
Epoch 17/50
79/79 [==============================] - 1s 9ms/step - loss: 0.0119 - val_loss: 0.0052
Epoch 18/50
79/79 [==============================] - 1s 9ms/step - loss: 0.0119 - val_loss: 0.0052
Epoch 19/50
79/79 [==============================] - 1s 9ms/step - loss: 0.0119 - val_loss: 0.0053
Epoch 20/50
79/79 [==============================] - 1s 9ms/step - loss: 0.0118 - val_loss: 0.0053
Epoch 21/50
79/79 [==============================] - 1s 10ms/step - loss: 0.0118 - val_loss: 0.0053
Epoch 22/50
79/79 [==============================] - 1s 10ms/step - loss: 0.0118 - val_loss: 0.0053
Epoch 23/50
79/79 [==============================] - 1s 10ms/step - loss: 0.0118 - val_loss: 0.0053
Epoch 24/50
79/79 [==============================] - 1s 9ms/step - loss: 0.0118 - val_loss: 0.0053
Epoch 25/50
79/79 [==============================] - 1s 10ms/step - loss: 0.0118 - val_loss: 0.0053
Epoch 26/50
79/79 [==============================] - 1s 9ms/step - loss: 0.0117 - val_loss: 0.0053
Epoch 27/50
79/79 [==============================] - 1s 10ms/step - loss: 0.0117 - val_loss: 0.0053
Epoch 28/50
79/79 [==============================] - 1s 10ms/step - loss: 0.0117 - val_loss: 0.0053
Epoch 29/50
79/79 [==============================] - 1s 9ms/step - loss: 0.0117 - val_loss: 0.0053
Epoch 30/50
79/79 [==============================] - 1s 10ms/step - loss: 0.0117 - val_loss: 0.0053
Epoch 31/50
79/79 [==============================] - 1s 10ms/step - loss: 0.0117 - val_loss: 0.0053
Epoch 32/50
79/79 [==============================] - 1s 10ms/step - loss: 0.0117 - val_loss: 0.0053
Epoch 33/50
79/79 [==============================] - 1s 10ms/step - loss: 0.0117 - val_loss: 0.0053
Epoch 34/50
79/79 [==============================] - 1s 10ms/step - loss: 0.0117 - val_loss: 0.0053
Epoch 35/50
79/79 [==============================] - 1s 10ms/step - loss: 0.0117 - val_loss: 0.0054
Epoch 36/50
79/79 [==============================] - 1s 10ms/step - loss: 0.0117 - val_loss: 0.0054
Epoch 37/50
79/79 [==============================] - 1s 10ms/step - loss: 0.0117 - val_loss: 0.0054
Epoch 38/50
79/79 [==============================] - 1s 10ms/step - loss: 0.0117 - val_loss: 0.0054
Epoch 39/50
79/79 [==============================] - 1s 10ms/step - loss: 0.0117 - val_loss: 0.0054
Epoch 40/50
79/79 [==============================] - 1s 10ms/step - loss: 0.0117 - val_loss: 0.0054
Epoch 41/50
79/79 [==============================] - 1s 10ms/step - loss: 0.0117 - val_loss: 0.0054
Epoch 42/50
79/79 [==============================] - 1s 10ms/step - loss: 0.0117 - val_loss: 0.0055
Epoch 43/50
79/79 [==============================] - 1s 10ms/step - loss: 0.0117 - val_loss: 0.0055
Epoch 44/50
79/79 [==============================] - 1s 10ms/step - loss: 0.0117 - val_loss: 0.0055
Epoch 45/50
79/79 [==============================] - 1s 10ms/step - loss: 0.0117 - val_loss: 0.0055
Epoch 46/50
79/79 [==============================] - 1s 10ms/step - loss: 0.0116 - val_loss: 0.0055
Epoch 47/50
79/79 [==============================] - 1s 12ms/step - loss: 0.0117 - val_loss: 0.0055
Epoch 48/50
79/79 [==============================] - 1s 15ms/step - loss: 0.0116 - val_loss: 0.0055
Epoch 49/50
79/79 [==============================] - 1s 16ms/step - loss: 0.0117 - val_loss: 0.0055
Epoch 50/50
79/79 [==============================] - 1s 16ms/step - loss: 0.0116 - val_loss: 0.0054
compilation time :  41.07307744026184
In [28]:
joblib.dump(SRNN_model_5p,'/content/drive/Shareddrives/Mineria /SRNN_r')
Out[28]:
['/content/drive/Shareddrives/Mineria /SRNN_r']
In [29]:
SRNN_model_5p.summary()
Model: "series_SRNN_model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 input_3 (InputLayer)        [(None, 50, 1)]           0         
                                                                 
 SRNN (SimpleRNN)            (None, 60)                3720      
                                                                 
 dense_2 (Dense)             (None, 1)                 61        
                                                                 
=================================================================
Total params: 3,781
Trainable params: 3,781
Non-trainable params: 0
_________________________________________________________________
In [30]:
SRNN_losses_5p = pd.DataFrame(SRNN_5p.history)
fig = px.line(SRNN_losses_5p, x=SRNN_losses_5p.index, y=["loss", "val_loss"])
fig.update_xaxes(title_text="Epoch")
fig.update_yaxes(title_text="Loss")
fig.show()
In [31]:
SRNN_Predict_5p = SRNN_model.predict(X_test_5p )
SRNN_Predict_5p = scaler.inverse_transform(SRNN_Predict_5p)
10/10 [==============================] - 0s 6ms/step
In [32]:
# plot y_train, y_test, and testPredict using plotly
seq_len = 50
fig = go.Figure()
fig.add_trace(
    go.Scatter(
        x=df.index[seq_len : len(y_train_5p) + seq_len],
        y=scaler.inverse_transform(y_train_5p).ravel(),
        mode="lines",
        name="Entrenamiento",
    )
)
fig.add_trace(
    go.Scatter(
        x=df.index[len(y_train_5p) + seq_len :],
        y=scaler.inverse_transform(y_test_5p).ravel(),
        mode="lines",
        name="Prueba",
    )
)
fig.add_trace(
    go.Scatter(
        x=df.index[len(y_train_5p) + seq_len :],
        y=SRNN_Predict_5p.ravel(),
        mode="lines",
        name="Predicción",
    )
)
fig.update_xaxes(title_text="Fecha")
fig.update_yaxes(title_text="ValorObservado")
fig.show()
In [33]:
SRNN_Score2 = metrics.mean_squared_error(scaler.inverse_transform(y_test_5p), SRNN_Predict_5p)
print('Test Score: %.2f RMSE' % (SRNN_Score2))
Test Score: 1.56 RMSE

Tomando mƔs retardos

In [34]:
def create_dataset(X, y, time_steps=1):
    # crea dos listas vacias para depositar los datos
    Xs, ys = [], []
    # el primer lote de datos empieza en la primera observación
    # y toma time_steps  datos.
    # Comienza a avanzar hacia adelante.
    for i in range(len(X) - time_steps):
        v = X.iloc[i:(i + time_steps)].values
        Xs.append(v)
        ys.append(y.iloc[i + time_steps])
    return np.array(Xs), np.array(ys)
In [35]:
time_steps1 = 100

# reshape to [samples, time_steps, n_features]

X_train_r, y_train_r = create_dataset(train, train, time_steps1)
X_test_r, y_test_r = create_dataset(test, test, time_steps1)

print("X_train.shape = ", X_train_r.shape)
print("y_train.shape = ", y_train_r.shape)
print("X_test.shape = ", X_test_r.shape)
print("y_test.shape = ", y_test_r.shape)
X_train.shape =  (1360, 100, 1)
y_train.shape =  (1360, 1)
X_test.shape =  (266, 100, 1)
y_test.shape =  (266, 1)
In [36]:
# shapes
inputs_shape = (X_train_r.shape[1], X_train_r.shape[2])
SRNN_output_r = 60

# layers
inputs_r = L.Input(inputs_shape)
srnn_r = L.SimpleRNN(units=SRNN_output_r, name='SRNN')(inputs_r)
outputs_SRNN_r = L.Dense(1)(srnn_r)

SRNN_model_r = Model(inputs=inputs_r, outputs=outputs_SRNN_r, name='series_SRNN_model')

# Compiling the RNN
SRNN_model_r.compile(optimizer=Adam(0.001),loss="mean_squared_error")
# Fitting to the training set
start = time.time()
SRNN_r = SRNN_model_r.fit(
    X_train_r,
    y_train_r,
    epochs=50,
    batch_size=16,
    validation_split=0.1,
    verbose=1,
    shuffle=False
)
print("compilation time : ", time.time() - start)
Epoch 1/50
77/77 [==============================] - 3s 21ms/step - loss: 0.0177 - val_loss: 0.0180
Epoch 2/50
77/77 [==============================] - 1s 18ms/step - loss: 0.0135 - val_loss: 0.0064
Epoch 3/50
77/77 [==============================] - 1s 18ms/step - loss: 0.0098 - val_loss: 0.0039
Epoch 4/50
77/77 [==============================] - 1s 19ms/step - loss: 0.0073 - val_loss: 0.0041
Epoch 5/50
77/77 [==============================] - 1s 18ms/step - loss: 0.0069 - val_loss: 0.0039
Epoch 6/50
77/77 [==============================] - 1s 18ms/step - loss: 0.0066 - val_loss: 0.0038
Epoch 7/50
77/77 [==============================] - 1s 19ms/step - loss: 0.0063 - val_loss: 0.0038
Epoch 8/50
77/77 [==============================] - 1s 18ms/step - loss: 0.0061 - val_loss: 0.0038
Epoch 9/50
77/77 [==============================] - 1s 19ms/step - loss: 0.0060 - val_loss: 0.0038
Epoch 10/50
77/77 [==============================] - 1s 18ms/step - loss: 0.0059 - val_loss: 0.0037
Epoch 11/50
77/77 [==============================] - 1s 19ms/step - loss: 0.0058 - val_loss: 0.0037
Epoch 12/50
77/77 [==============================] - 1s 19ms/step - loss: 0.0058 - val_loss: 0.0037
Epoch 13/50
77/77 [==============================] - 1s 19ms/step - loss: 0.0058 - val_loss: 0.0037
Epoch 14/50
77/77 [==============================] - 2s 20ms/step - loss: 0.0057 - val_loss: 0.0036
Epoch 15/50
77/77 [==============================] - 1s 18ms/step - loss: 0.0057 - val_loss: 0.0036
Epoch 16/50
77/77 [==============================] - 1s 18ms/step - loss: 0.0057 - val_loss: 0.0036
Epoch 17/50
77/77 [==============================] - 1s 18ms/step - loss: 0.0057 - val_loss: 0.0036
Epoch 18/50
77/77 [==============================] - 1s 18ms/step - loss: 0.0057 - val_loss: 0.0035
Epoch 19/50
77/77 [==============================] - 1s 18ms/step - loss: 0.0057 - val_loss: 0.0035
Epoch 20/50
77/77 [==============================] - 1s 18ms/step - loss: 0.0057 - val_loss: 0.0035
Epoch 21/50
77/77 [==============================] - 1s 19ms/step - loss: 0.0057 - val_loss: 0.0035
Epoch 22/50
77/77 [==============================] - 1s 19ms/step - loss: 0.0057 - val_loss: 0.0035
Epoch 23/50
77/77 [==============================] - 1s 18ms/step - loss: 0.0057 - val_loss: 0.0035
Epoch 24/50
77/77 [==============================] - 1s 19ms/step - loss: 0.0057 - val_loss: 0.0035
Epoch 25/50
77/77 [==============================] - 1s 18ms/step - loss: 0.0057 - val_loss: 0.0035
Epoch 26/50
77/77 [==============================] - 2s 26ms/step - loss: 0.0057 - val_loss: 0.0035
Epoch 27/50
77/77 [==============================] - 3s 33ms/step - loss: 0.0057 - val_loss: 0.0035
Epoch 28/50
77/77 [==============================] - 2s 23ms/step - loss: 0.0057 - val_loss: 0.0035
Epoch 29/50
77/77 [==============================] - 1s 18ms/step - loss: 0.0057 - val_loss: 0.0034
Epoch 30/50
77/77 [==============================] - 1s 18ms/step - loss: 0.0057 - val_loss: 0.0035
Epoch 31/50
77/77 [==============================] - 1s 18ms/step - loss: 0.0057 - val_loss: 0.0034
Epoch 32/50
77/77 [==============================] - 1s 18ms/step - loss: 0.0056 - val_loss: 0.0035
Epoch 33/50
77/77 [==============================] - 1s 18ms/step - loss: 0.0056 - val_loss: 0.0034
Epoch 34/50
77/77 [==============================] - 1s 18ms/step - loss: 0.0057 - val_loss: 0.0034
Epoch 35/50
77/77 [==============================] - 1s 18ms/step - loss: 0.0057 - val_loss: 0.0035
Epoch 36/50
77/77 [==============================] - 1s 18ms/step - loss: 0.0057 - val_loss: 0.0034
Epoch 37/50
77/77 [==============================] - 1s 18ms/step - loss: 0.0056 - val_loss: 0.0034
Epoch 38/50
77/77 [==============================] - 2s 19ms/step - loss: 0.0056 - val_loss: 0.0033
Epoch 39/50
77/77 [==============================] - 1s 18ms/step - loss: 0.0057 - val_loss: 0.0034
Epoch 40/50
77/77 [==============================] - 1s 17ms/step - loss: 0.0056 - val_loss: 0.0033
Epoch 41/50
77/77 [==============================] - 1s 18ms/step - loss: 0.0056 - val_loss: 0.0034
Epoch 42/50
77/77 [==============================] - 1s 18ms/step - loss: 0.0056 - val_loss: 0.0034
Epoch 43/50
77/77 [==============================] - 1s 18ms/step - loss: 0.0057 - val_loss: 0.0034
Epoch 44/50
77/77 [==============================] - 1s 18ms/step - loss: 0.0057 - val_loss: 0.0034
Epoch 45/50
77/77 [==============================] - 1s 19ms/step - loss: 0.0057 - val_loss: 0.0034
Epoch 46/50
77/77 [==============================] - 1s 18ms/step - loss: 0.0056 - val_loss: 0.0034
Epoch 47/50
77/77 [==============================] - 1s 18ms/step - loss: 0.0056 - val_loss: 0.0034
Epoch 48/50
77/77 [==============================] - 1s 18ms/step - loss: 0.0056 - val_loss: 0.0034
Epoch 49/50
77/77 [==============================] - 1s 18ms/step - loss: 0.0056 - val_loss: 0.0034
Epoch 50/50
77/77 [==============================] - 1s 18ms/step - loss: 0.0057 - val_loss: 0.0034
compilation time :  74.04706740379333
In [37]:
joblib.dump(SRNN_model_5p,'/content/drive/Shareddrives/Mineria /SRNN_r')
Out[37]:
['/content/drive/Shareddrives/Mineria /SRNN_r']
In [38]:
SRNN_model_r.summary()
Model: "series_SRNN_model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 input_4 (InputLayer)        [(None, 100, 1)]          0         
                                                                 
 SRNN (SimpleRNN)            (None, 60)                3720      
                                                                 
 dense_3 (Dense)             (None, 1)                 61        
                                                                 
=================================================================
Total params: 3,781
Trainable params: 3,781
Non-trainable params: 0
_________________________________________________________________
In [39]:
SRNN_losses_r = pd.DataFrame(SRNN_r.history)
fig = px.line(SRNN_losses_r, x=SRNN_losses_r.index, y=["loss", "val_loss"])
fig.update_xaxes(title_text="Epoch")
fig.update_yaxes(title_text="Loss")
fig.show()
In [40]:
SRNN_Predict_r = SRNN_model_r.predict(X_test_r)
SRNN_Predict_r = scaler.inverse_transform(SRNN_Predict_r)
9/9 [==============================] - 0s 8ms/step
In [41]:
# plot y_train, y_test, and testPredict using plotly
seq_len1=100
fig = go.Figure()
fig.add_trace(
    go.Scatter(
        x=df.index[seq_len1 : len(y_train_r) + seq_len1],
        y=scaler.inverse_transform(y_train_r).ravel(),
        mode="lines",
        name="Entrenamiento",
    )
)
fig.add_trace(
    go.Scatter(
        x=df.index[len(y_train_r) + seq_len1 :],
        y=scaler.inverse_transform(y_test_r).ravel(),
        mode="lines",
        name="Prueba",
    )
)
fig.add_trace(
    go.Scatter(
        x=df.index[len(y_train_r) + seq_len1 :],
        y=SRNN_Predict_r.ravel(),
        mode="lines",
        name="Predicción",
    )
)
fig.update_xaxes(title_text="Fecha")
fig.update_yaxes(title_text="ValorObservado")
fig.show()
In [42]:
SRNN_Score3 = metrics.mean_squared_error(scaler.inverse_transform(y_test_r), SRNN_Predict_r)
print('Test Score: %.2f RMSE' % (SRNN_Score3))
Test Score: 0.70 RMSE

Comparación

In [43]:
print('Modelo SRNN 1 paso adelante 50 retardos %f' % (SRNN_Score1))
print('Modelo SRNN 5 paso adelante 50 retardos %f' % (SRNN_Score2))
print('Modelo SRNN 1 paso adelante 100 retardos %f' % (SRNN_Score3))
Modelo SRNN 1 paso adelante 50 retardos 0.723050
Modelo SRNN 5 paso adelante 50 retardos 1.555407
Modelo SRNN 1 paso adelante 100 retardos 0.695902